Data on acoustic phonetic properties of non-coronal fricatives in monosyllabic words of Zhongjiang Chinese

The data reported in this article are non-coronal fricative measurements from 10 (5 male; 5 female) native speakers of Zhongjiang Chinese. Each speaker produced 10 repetitions of 90 monosyllabic words beginning with either a velar fricative, /x/, or a labiodental fricative, /f/. The measurements reported are spectral properties often used to characterize fricative variation: spectrum center of gravity (CoG), spectrum standard deviation (SD), spectrum skew, spectrum kurtosis, maximum amplitude frequency, and maximum amplitude. These measurements are compared across two data filtering conditions: a high pass filter condition, in which a 300 Hz high pass filter was applied to the data before spectral measurements were calculated, and a no filter condition. The 90 monosyllabic words include the target fricatives in different phonetic environments. Target words include some that historically derive from different fricatives and show variation across regional varieties of Mandarin Chinese. Subsets of the target materials enable several closely matched comparisons of items. We describe measurements across the whole dataset and compare the effect that filtering has on the measurements. The data also include a CSV file with measurements of each token, which enables comparison of phonetic contexts, lexical effects and individual differences in fricative variation beyond those described here. For further discussion of the data, please refer to the full length article entitled "The role of gestural timing in non-coronal fricative mergers in Southwestern Mandarin: acoustic evidence from a dialect island", published in Journal of Phonetics [6].


Specifications Table
Subject: Linguistics
Specific subject area: Phonetics, dialect variation, and sound change
Type of data: Tables, figures, spreadsheet
How data were acquired: Acoustic measurements based on speech recorded in a studio setting.
Data format: Analysed
Parameters for data collection: Native speakers of the Zhongjiang dialect of Chinese were selected for participation. To minimize environmental noise, participants were recorded in a sound-attenuated room. Data were recorded at a high sampling rate to allow analysis of higher frequencies (greater than 8 kHz) characteristic of fricatives.
Description of data collection: The data in this paper were collected as part of a longer recording session, following standard protocol for naturalistic speech data elicitation.

Value of the Data
• Data reveal patterns of synchronic variation in fricative production that can be related to variation and change in language communities across Southwest China.
• Data are useful to phoneticians studying speech production and to historical linguists, sociolinguists and linguistic anthropologists studying language variation and change in China.
• Data can be used to test new hypotheses about non-coronal fricative variation, including which phonetic dimensions differentiate velar and labiodental fricatives and how these dimensions vary across contexts.
• Data provide a historical record of synchronic speech variation in Zhongjiang, China, a dialect island, which may be in the process of assimilating under the influence of other language varieties.
• Data can be used to evaluate the effect of a high pass filter on common fricative measurements.

Data Description
The data presented in this article illustrate phonetic properties of the velar and labiodental fricatives in Zhongjiang Chinese. Some studies reporting spectral properties of fricatives apply a high pass filter to the acoustic signal before extracting measurements [1], while others report no such filter [2]. Some common phonetic measurements of fricatives, in particular spectral moments [3], are sensitive to whether the data have been high pass filtered. This can make these measurements difficult to compare across studies that use different filtering techniques, including no high pass filter. Other types of spectral measurements that are also used to describe fricatives [4, 5], such as the maximum amplitude frequency or the amplitude at that frequency, are likely robust to filtering. This article presents spectral measurements with and without a high pass filter at 300 Hz, illustrating how sensitive phonetic measurements commonly used to describe fricatives are to this filter.
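To make the contrast between spectral moments and peak-based measurements concrete, the following minimal sketch computes the four moments from a toy power spectrum, with and without removal of energy below 300 Hz. It assumes the usual power-weighted definitions of the moments (CoG as the weighted mean frequency, SD as the square root of the second central moment, skew and excess kurtosis as normalized third and fourth central moments) and an idealized brick-wall filter. The toy spectrum, the function names and the filter shape are illustrative assumptions, not the Praat scripts distributed with the data.

```python
import math

def spectral_moments(freqs, power):
    """Return (CoG, SD, skew, excess kurtosis) of a power spectrum."""
    total = sum(power)
    cog = sum(f * p for f, p in zip(freqs, power)) / total

    def central(n):
        # n-th central moment, weighted by spectral power
        return sum((f - cog) ** n * p for f, p in zip(freqs, power)) / total

    mu2 = central(2)
    sd = math.sqrt(mu2)
    skew = central(3) / mu2 ** 1.5
    kurt = central(4) / mu2 ** 2 - 3.0
    return cog, sd, skew, kurt

def high_pass(freqs, power, cutoff_hz=300.0):
    """Idealized brick-wall filter (an assumption): zero every bin below the cutoff."""
    return [p if f >= cutoff_hz else 0.0 for f, p in zip(freqs, power)]

# Toy spectrum with invented values: a broad noise peak near 2,000 Hz
# plus extra low-frequency energy at 100 Hz and 200 Hz.
freqs = [100.0 * k for k in range(1, 101)]  # 100 Hz to 10,000 Hz
power = [math.exp(-((f - 2000.0) / 800.0) ** 2) for f in freqs]
power[0] += 0.5  # 100 Hz
power[1] += 0.3  # 200 Hz

filtered = high_pass(freqs, power)
cog_nf, sd_nf, skew_nf, kurt_nf = spectral_moments(freqs, power)
cog_hp, sd_hp, skew_hp, kurt_hp = spectral_moments(freqs, filtered)

# Frequency of the highest-amplitude bin, before and after filtering
peak_nf = freqs[power.index(max(power))]
peak_hp = freqs[filtered.index(max(filtered))]
```

In this toy case the filtered CoG is higher than the unfiltered CoG, while the peak frequency is unchanged because the spectral peak lies above the cutoff. The direction of the change in SD depends on how energy is distributed across the spectrum, so the decrease seen here should not be read as general.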
The figures below compare measurements with and without the filter across the entire dataset. The CSV file submitted with the article, described in the supplementary data section, is coded for specific subsets of the data designed to address different aspects of the pattern (see [6]). In addition to the CSV file containing measurements of all tokens, the supplementary materials include a Praat [7] script for extracting the measurements with and without the filter and an R [8] script for outlier removal and figure generation.

Data overview
This section presents figures of six spectral measurements of non-coronal fricatives: spectrum center of gravity (CoG; Fig. 1), spectrum standard deviation (SD; Fig. 3), spectrum skew (Fig. 4), spectrum kurtosis (Fig. 5), maximum amplitude frequency (Fig. 6), and maximum amplitude (Fig. 7). For comparison, each measure is presented with and without a 300 Hz high pass filter. The figures show that the filter affects the measurements related to spectral moments (CoG, SD, skew and kurtosis) but not maximum amplitude frequency or maximum amplitude. We also present CoG by speaker gender in Fig. 2.
Fig. 1 compares spectrum center of gravity (CoG) with and without a high pass filter at 300 Hz. The top panels show the distribution of CoG values by fricative. The distribution for the /x/ fricative is centered at low frequency. Most of the values are below 5,000 Hz, regardless of filtering. The /f/ fricative shows a wider range of variation. The distribution for /f/ is bimodal, with a peak at high frequency, approximately 7,000 Hz, and another at low frequency, overlapping with the /x/ distribution. The high pass filter raises CoG measurements somewhat. This can be seen in the kernel density plots in the top panels as well as in the scatter plot in the bottom panel. The bottom panel plots the unfiltered CoG measurements (x-axis) against the filtered measurements (y-axis), by speaker. Since there is no probability mass below 300 Hz in the filtered data, the CoG values are shifted upwards, although the degree of the shift varies across both speakers and fricatives. Tokens that fall on the diagonal are unaffected by the filter. Tokens above the diagonal have a higher CoG in the filtered data than in the unfiltered data. The degree of this difference is variable, but appears to be larger for /f/ than for /x/.
Fig. 2 shows the distribution of CoG by gender across the entire dataset, including both /f/ and /x/ tokens. The distribution of CoG values for both genders is bimodal. The lower frequency mode is centered on a similar frequency across genders. The higher frequency mode shows some gender differences: the peak is at a higher frequency for males than for females.
Fig. 3 compares spectrum standard deviation (SD) across the entire dataset with and without the high pass filter. The top panels show density plots by fricative. The /x/ fricative has a low SD. The distribution of SD for /f/ is bimodal, at least in the unfiltered data (top, left panel). The larger peak of the distribution is centered around 4,500 Hz. The smaller peak of the /f/ distribution overlaps with /x/. The effect of filtering on spectrum SD varies across tokens, as can be seen in the lower panel, which plots unfiltered SD (x-axis) against filtered SD (y-axis) by speaker. For some speakers, filtering raises spectrum SD. This is clearest for S07 and, to a lesser degree, S01 and S03. Filtering energy at low frequencies can increase the imbalance in amplitude across the spectrum. However, some tokens show decreased SD with filtering. This is the tendency, in particular, for S09 and S10. Thus, unlike CoG, the effect that filtering has on SD is not uniform across speakers. It depends in part on the speaker-specific distribution of energy across the spectrum.
Fig. 4 shows the spectrum skew by fricative. The distribution for both fricatives is monomodal. The /f/ fricative shows a lower and sharper peak in its distribution than /x/. The effect of the filter on this measurement is somewhat inconsistent. There is an overall tendency for the filter to lower skew, particularly for /f/. This is indicated by the number of tokens below the diagonal in the scatter plot shown in the bottom panel of Fig. 4. However, there are also some tokens above the diagonal, indicating a raising effect of the high pass filter on skew.
Fig. 5 shows spectrum kurtosis by fricative. Kurtosis is similar for /f/ and /x/, but the distribution of values is more consistently low for /f/ than for /x/, as indicated by the density plots. The wider range of kurtosis values for /x/ is particularly salient for speakers 8, 9, and 10 in the by-speaker scatter plot (Fig. 5: bottom). Filtering typically lowers kurtosis; however, in some tokens filtering increases kurtosis.
Fig. 6 shows the maximum amplitude frequency in the spectrum by fricative, with and without a 300 Hz high pass filter. The top panels show density plots and the bottom panel shows a scatter plot. The data points in the scatter plot fall on a straight line of slope 1, indicating that the filter has no effect on this measurement whatsoever. In terms of the difference in frequency by fricative, /x/ has a lower maximum amplitude frequency, typically around 2,000 Hz, with some higher values, likely due to individual differences. The maximum amplitude frequency for /f/, on the other hand, shows a bimodal distribution, with one mode around 8,000 Hz and the other overlapping heavily with /x/. This variation is also found in CoG measurements (Fig. 1), which are discussed extensively in [6].
Fig. 7 shows the amplitude of the highest amplitude frequency by fricative with and without a high pass filter. As with the maximum amplitude frequency, filtering the data had no effect on the maximum amplitude measurement. This is illustrated by the identical distributions in the top panels and by the scatter plot in the bottom panel. There was a small effect of fricative. On average, the maximum amplitude for /x/ is somewhat greater than for /f/.
Figs. 1-7, presented as an overview of the data, display a subset of the measurements. The figures present key spectral measurements at the fricative midpoint. The complete set of measurements, including those made at earlier and later timepoints in the fricatives, is available in the supplementary data file. The original sound files and textgrids are available on OSF [9].

Supplementary data
The supplementary data file includes the measurements reported above in Figs. 1-7 (spectrum CoG, spectrum SD, spectrum skew, spectrum kurtosis, maximum amplitude frequency, maximum amplitude), sampled at different temporal intervals within the fricative. Measurements of fricative duration are also included. The data are coded for the specific data subsets described in [6], as well as for vowel context, the category of the fricative (/x/, /f/), the tone of the syllable, and the proto-category of each fricative, according to reconstructions of Middle Chinese. This coding facilitates construction of additional subsets to further explore variation in the data. The data are also coded for speaker and for speaker gender. Additional information about the speakers, including age and occupation, is not coded in the data file but is provided in the description of the speakers in the methods below (see Table 1). Columns containing measurements calculated without the high pass filter end in "nf" for "no filter".
A description of the columns in the data file is as follows:
(1) filename: the filename of the wav file for the token; in the OSF archive, each token is a separate file.
(2) syllable: unique number 1-90 for each item in the study; all items were monosyllabic. The syllable ID number corresponds to the first column in Table 2 (below).
(3) speaker: unique number 1-10 for each speaker in the study. The speaker number corresponds to the first column of Table 1 below.
(4) gender: binary gender of the speaker, coded as "f" or "m".
fricative_IPA: coded as either "x" or "f" for the initial fricative that is expected to be produced in the word based on the description of Zhongjiang as a Type II merger variety of Southwest Mandarin; for more description of Mandarin merger types, see [6].
(12) tone_number: a single digit number, 1-4, representing the lexical tones, where 1 is the high-rising tone, 2 is the mid-falling tone, 3 is the falling tone, and 4 is the falling-rising tone.
(13) study1_items: indicates whether the item was included in the study1 subset, coded as "0" (not included) or "1" (included).
(14) study2_items: indicates whether the item was included in the study2 subset, coded as "0" (not included) or "1" (included).
(15) study3_items: indicates whether the item was included in the study3 subset, coded as "0" (not included) or "1" (included).
(16) study4_items: indicates whether the item was included in the study4 subset, coded as "0" (not included) or "1" (included).
(17) proto_category: indicates the category of the initial fricative in Middle Chinese.
(18) max_amp_nf: the maximum amplitude of the highest amplitude frequency of the fricative spectrum sampled at the middle 20 ms of the fricative. This value was calculated without first high pass filtering the data. "nf" stands for "no filter".
(23) Kurtosis_mid_nf: the spectrum kurtosis calculated at the middle 20 ms of the fricative, without first high pass filtering the data.
(24) COG1_nf: the spectrum center of gravity calculated at the first 20 ms of the fricative, without first high pass filtering the data.
(25) SD1_nf: the spectrum standard deviation calculated at the first 20 ms of the fricative, without first high pass filtering the data.
(26) Skew1_nf: the spectrum skew calculated at the first 20 ms of the fricative, without first high pass filtering the data.
(27) Kurtosis1_nf: the spectrum kurtosis calculated at the first 20 ms of the fricative, without first high pass filtering the data.
(28) COG2_nf: the spectrum center of gravity calculated at the second 20 ms of the fricative, without first high pass filtering the data.
(29) SD2_nf: the spectrum standard deviation calculated at the second 20 ms of the fricative, without first high pass filtering the data.
(63) Kurtosis3_nf: the spectrum kurtosis calculated at the penultimate 20 ms of the fricative, after high pass filtering the data.
(64) COG4_nf: the spectrum center of gravity calculated at the final 20 ms of the fricative, after high pass filtering the data.
(65) SD4_nf: the spectrum standard deviation calculated at the final 20 ms of the fricative, after high pass filtering the data.
(66) Skew4_nf: the spectrum skew calculated at the final 20 ms of the fricative, after high pass filtering the data.
(67) Kurtosis4_nf: the spectrum kurtosis calculated at the final 20 ms of the fricative, after high pass filtering the data.
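As an illustration of how the "nf" naming convention can be used, the sketch below pairs each unfiltered column with a filtered counterpart and computes the per-token CoG shift. It rests on two assumptions that should be checked against the actual file: that the filtered counterpart has the same name without the "_nf" suffix, and that columns named COG_mid and COG_mid_nf exist. The two rows are invented values, and the helper name paired_columns is hypothetical.

```python
import csv
import io

# Hypothetical two-token excerpt; the column names COG_mid and COG_mid_nf
# and all values are assumptions made for illustration only.
sample = io.StringIO(
    "filename,speaker,fricative_IPA,COG_mid,COG_mid_nf\n"
    "tok1.wav,1,f,7100.0,6900.0\n"
    "tok2.wav,1,x,1850.0,1700.0\n"
)
rows = list(csv.DictReader(sample))

def paired_columns(fieldnames, suffix="_nf"):
    """Map each unfiltered column to its assumed filtered counterpart."""
    names = set(fieldnames)
    return {
        col: col[: -len(suffix)]
        for col in fieldnames
        if col.endswith(suffix) and col[: -len(suffix)] in names
    }

pairs = paired_columns(list(rows[0].keys()))

# Filtered minus unfiltered CoG: a positive value corresponds to a token
# lying above the diagonal in a filtered-versus-unfiltered scatter plot.
shifts = [float(r["COG_mid"]) - float(r["COG_mid_nf"]) for r in rows]
```

With the real file, the in-memory string would be replaced by an opened CSV file, and the pairing would only be meaningful if the naming assumption holds.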

Experimental Design, Materials and Methods
This section describes the experimental design and methods involved in developing a corpus of phonetic measurements of approximately 9,000 tokens of Zhongjiang Chinese non-coronal fricatives. Associated files include a spreadsheet of the measurements, the Praat scripts used to make the measurements, and an R script used for plotting the data, including some data preparation, coding and cleaning. The raw data, including wav files and segmentation (in the form of Praat textgrids), are publicly available in a data repository [9].

Speakers
All of the speakers were born and raised in urban Zhongjiang. Gender, age, and occupation are given in Table 1 .

Materials
The complete list of elicited words is provided in Table 2. There are a total of 90 monosyllabic words beginning with /x/ or /f/. The words in the table include these fricatives in different environments, including those predicted to condition fricative merger [6]. The data subsets that entered into each of the four studies in [6] are indicated in column 6. These data subsets consist of word pairs. The words that form pairs for each study are indicated in the final column, with an index grouping minimal pairs in parentheses. For example, syllable 2 (Standard Mandarin IPA [fa55], Zhongjiang IPA [fa31]) was included in study 2, as indicated in the last column; it is paired in study 2 with syllable 20 (Standard Mandarin IPA [xwa35], Zhongjiang IPA [fa31]). Syllable 2 and syllable 20 are labelled as "Study2(1)" in the last column; the "(1)" indicates that they are the first minimal pair included in that study. In the accompanying data file, four columns indicate which tokens were included in which studies. The columns are labelled "Study1_items", "Study2_items", "Study3_items", and "Study4_items". Each row in these columns is coded as either "0", if the token is not included in the study, or "1", if the token is included in the study.
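The study columns make it straightforward to recover each subset from the data file. The following is a minimal sketch under stated assumptions: the three rows are invented, the helper name study_subset is hypothetical, and the column name is matched case-insensitively because the exact capitalisation in the file has not been verified here.

```python
import csv
import io

# Hypothetical three-token excerpt of the data file (values invented).
sample = io.StringIO(
    "filename,syllable,Study2_items\n"
    "a.wav,2,1\n"
    "b.wav,20,1\n"
    "c.wav,5,0\n"
)
rows = list(csv.DictReader(sample))

def study_subset(rows, study):
    """Return the rows coded "1" for the given study number (1-4)."""
    if not rows:
        return []
    # Match the column name without regard to case, since the
    # capitalisation used in the real file is an open assumption.
    lookup = {name.lower(): name for name in rows[0]}
    column = lookup[f"study{study}_items"]
    return [r for r in rows if r[column].strip() == "1"]

study2 = study_subset(rows, 2)
syllables = sorted({int(r["syllable"]) for r in study2})
```

Grouping the selected rows by the syllable number then gives the tokens for each member of a word pair, such as syllables 2 and 20 in the example above.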

Procedure
The word list above was recorded in a sound-attenuated environment as part of a longer recording session. Details on the recording procedure are provided in [6]. Here we focus on a description of the data measurements.
The materials submitted with this paper include measurements from 8,991 wav files and corresponding Praat textgrids, indicating the segment boundaries for fricatives and vowels used to extract the measurements. The segment boundaries in the textgrids were determined by forced alignment, using the Montreal Forced Aligner [10], after first checking the aligner performance against 100 hand-segmented items. The original sound files, the textgrids based on a Zhongjiang-trained forced aligner, and the textgrids based on hand segmentation are publicly available at [9].
Of the 9,000 tokens recorded, 8,991 are represented in the data file. The remaining 9 tokens (0.1%) were excluded due to alignment failure.
The measurements reported in the data file were extracted using Praat [7] with reference to the segment boundaries from forced alignment. Some spectral measurements were taken at five different time windows in the target fricatives: the first 20 ms of the fricative, the second 20 ms of the fricative, the middle 20 ms of the fricative, the penultimate 20 ms of the fricative and the final 20 ms of the fricative. The spectral measurements extracted at these time windows were spectrum center of gravity (CoG), spectrum standard deviation (SD), spectrum skew, and spectrum kurtosis. In addition, the maximum amplitude frequency and the amplitude at that frequency were recorded at the middle 20 ms of the fricative. We include two Praat scripts with this submission. One extracts the measurements with no filtering in place. The other extracts the measurements after applying a high pass filter at 300 Hz. Both scripts extract spectral measurements based on the Nyquist frequency, 22,500 Hz, of the recordings.
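The five time windows can be derived from the fricative boundaries in a textgrid. The sketch below is one plausible reading of the description, not the logic of the distributed Praat scripts: it assumes that the first and second windows are contiguous from the fricative onset, that the penultimate and final windows are contiguous up to the offset, and that the middle window is centered on the temporal midpoint. The function name analysis_windows is hypothetical.

```python
def analysis_windows(start, end, width=0.020):
    """Return five (t0, t1) analysis windows, in seconds, for one fricative.

    Assumes edge windows are contiguous and the middle window is centered
    on the midpoint. For fricatives shorter than five window widths the
    windows overlap; how such tokens were handled is not stated in the text.
    """
    mid = (start + end) / 2.0
    return {
        "first": (start, start + width),
        "second": (start + width, start + 2 * width),
        "middle": (mid - width / 2.0, mid + width / 2.0),
        "penultimate": (end - 2 * width, end - width),
        "final": (end - width, end),
    }

# Example: a hypothetical fricative from 0.100 s to 0.300 s (200 ms long)
windows = analysis_windows(0.100, 0.300)
durations = {name: t1 - t0 for name, (t0, t1) in windows.items()}
```

Each window in the example is 20 ms long, and the middle window spans roughly 0.190 s to 0.210 s.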
The comparison of the measurements under these two analysis conditions (filter or no filter) is reported in Figs. 1-7 above. The two sets of measurements, with and without a high pass filter, are included in the dataset submitted with this paper.
The R script included with the submission provides code for loading the data, removing outliers, and reproducing the plots in Figs. 1-7. We discarded extreme outliers, defined as tokens that were more than three standard deviations from the mean of the high-pass-filtered spectrum CoG or the high-pass-filtered spectrum SD. This resulted in the loss of 35 tokens, or about 0.4% of the data. This is slightly fewer than the 46 outliers (about 0.5% of the data) that result when outliers are calculated on the unfiltered measurements.
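The outlier criterion can be sketched as follows. This is an illustrative Python version under stated assumptions, not the R script itself: the mean and standard deviation are computed over the whole dataset rather than by fricative or speaker, the sample standard deviation is used, and the keys "cog" and "sd" stand in for the filtered CoG and SD columns. The toy tokens are invented.

```python
import statistics

def within_k_sd(values, k=3.0):
    """Flag each value as True if it lies within k sample SDs of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) <= k * sd for v in values]

def drop_outliers(tokens, keys=("cog", "sd"), k=3.0):
    """Keep tokens that are within k SDs of the mean on every listed measure."""
    keep = [True] * len(tokens)
    for key in keys:
        flags = within_k_sd([t[key] for t in tokens], k)
        keep = [a and b for a, b in zip(keep, flags)]
    return [t for t, ok in zip(tokens, keep) if ok]

# Toy data with invented values: 30 ordinary tokens and one extreme CoG value
tokens = [{"cog": 2000.0 + 10 * i, "sd": 1000.0 + 5 * i} for i in range(30)]
tokens.append({"cog": 20000.0, "sd": 1100.0})

kept = drop_outliers(tokens)
n_removed = len(tokens) - len(kept)
```

A token is dropped if it is extreme on either measure, which matches the "CoG or SD" wording of the criterion.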

Ethics
The research activities that generated the data, involving non-invasive recording of speech, are exempt from Xihua University IRB review. All participants in the study gave informed consent to be recorded and to have the recordings analysed for academic (non-commercial) purposes.